Efficient use of DNN bottleneck features in generalized variable parameter HMMs for noise robust speech recognition
Authors
Abstract
Recently, a new approach was proposed to incorporate deep neural network (DNN) bottleneck features into HMM-based acoustic models using generalized variable parameter HMMs (GVP-HMMs). As Gaussian-component-level polynomial interpolation is performed for each high-dimensional DNN bottleneck feature vector at the frame level, conventional GVP-HMMs are computationally expensive at recognition time. To address this problem, this paper exploits several approaches for using DNN bottleneck features efficiently in GVP-HMMs, including model selection techniques to optimally reduce the polynomial degrees, an efficient GMM-based bottleneck feature clustering scheme, and more compact GVP-HMM trajectory modelling of model space tied linear transformations. These improvements gave a total 16-fold speed-up in decoding time over conventional GVP-HMMs using a uniformly assigned polynomial degree. A significant error rate reduction of 15.6% relative was obtained over the baseline tandem HMM system on the secondary microphone channel condition of the Aurora 4 task, and consistent improvements were also obtained on the other subsets.
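The core idea the abstract describes can be illustrated with a minimal sketch: in a GVP-HMM, each Gaussian mean dimension is modelled as a polynomial of an auxiliary variable (here, a bottleneck-derived value), and the GMM-based clustering step replaces the per-frame bottleneck vector with a cluster centroid so interpolation need only be done once per cluster rather than per frame. All names, shapes, and the hard-assignment clustering below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def interpolate_mean(coeffs, v):
    """Polynomial interpolation of a Gaussian mean (illustrative).

    coeffs : (P+1, D) array of polynomial coefficients, degree-major,
             so mu(v) = sum_p coeffs[p] * v**p for each of D dimensions.
    v      : scalar auxiliary variable derived from the bottleneck feature.
    Returns the D-dimensional interpolated mean vector.
    """
    powers = np.array([v ** p for p in range(coeffs.shape[0])])  # (P+1,)
    return powers @ coeffs                                        # (D,)

def nearest_centroid(bottleneck, centroids):
    """Sketch of the clustering idea: map a per-frame bottleneck vector
    to its nearest centroid (hard assignment stands in for a full GMM),
    so component-level interpolation is shared across frames."""
    dists = np.linalg.norm(centroids - bottleneck, axis=1)
    return centroids[np.argmin(dists)]

# Tiny example: a degree-2 polynomial for a 1-dimensional mean,
# mu(v) = 1 + 0.5*v + 0.25*v^2, evaluated at v = 2.
coeffs = np.array([[1.0], [0.5], [0.25]])
mu = interpolate_mean(coeffs, 2.0)  # 1 + 1 + 1 = 3
```

A lower polynomial degree shrinks the `(P+1, D)` coefficient array and the per-evaluation cost, which is one way to see why the degree-reduction and clustering steps together yield the reported decoding speed-up.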
Similar papers
Deep neural network bottleneck features for generalized variable parameter HMMs
Recently deep neural networks (DNNs) have become increasingly popular for acoustic modelling in automatic speech recognition (ASR) systems. As the bottleneck features they produce are inherently discriminative and contain rich hidden factors that influence the surface acoustic realization, the standard approach is to augment the conventional acoustic features with the bottleneck features in a t...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems for feature extraction as well as acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN which has a bottleneck layer among its fully connected layers. The bottleneck fea...
Generalized variable parameter HMMs based acoustic-to-articulatory inversion
Acoustic-to-articulatory inversion is useful for a range of related research areas including language learning, speech production, speech coding, speech recognition and speech synthesis. HMM-based generative modelling methods and DNN-based approaches have become dominant in recent years. In this paper, a novel acoustic-to-articulatory inversion technique based on generalized variable ...
Feature space generalized variable parameter HMMs for noise robust recognition
Handling variable ambient noise is a challenging task for automatic speech recognition (ASR) systems. To address this issue, multi-style training using speech data collected in diverse noise environments, noise adaptive training or uncertainty decoding techniques can be used. An alternative approach is to explicitly approximate the continuous trajectory of Gaussian component or model space line...
Generalized Variable Parameter HMMs for Noise Robust Speech Recognition
Handling variable ambient noise is a challenging task for automatic speech recognition (ASR) systems. To address this issue, multi-style, noise condition independent (CI) model training using speech data collected in diverse noise environments, or uncertainty decoding techniques can be used. An alternative approach is to explicitly approximate the continuous trajectory of Gaussian component mea...